nfd-worker: Watch features.d changes #2156

Open
wants to merge 1 commit into
base: master

Conversation

ozhuraki
Contributor

Closes: #2075

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 12, 2025

netlify bot commented May 12, 2025

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
🔨 Latest commit b9ea21a
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-nfd/deploys/689df2d5da4e1600083e3e28
😎 Deploy Preview https://deploy-preview-2156--kubernetes-sigs-nfd.netlify.app

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ozhuraki
Once this PR has been reviewed and has the lgtm label, please assign marquiz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 12, 2025
@k8s-ci-robot
Contributor

Hi @ozhuraki. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 12, 2025
Contributor

@marquiz marquiz left a comment

Thanks @ozhuraki for taking a stab at this.

I think we should refactor and re-architect this to make the code more maintainable. There might be other sources that we'd also want to react to events in a similar way. Basically, it should be the source (source/local in this case) that notifies the main event loop that features have been updated. Also, there is no need to re-run discovery of all features.

@ozhuraki
Contributor Author

@marquiz

Thanks, makes sense. I will move this into source/local.

@ozhuraki
Contributor Author

@marquiz

Moved into source/local, please take a look

@ArangoGutierrez ArangoGutierrez requested a review from Copilot May 28, 2025 05:58
Copilot

This comment was marked as outdated.

@ozhuraki
Contributor Author

ozhuraki commented Jun 3, 2025

@ArangoGutierrez

Thanks, updated, please take a look.

Contributor

@marquiz marquiz left a comment

Progress in the right direction, but I maintain that we should aim for a more generic, maintainable solution. For example, in the future we might want to do some uevent-based stuff or similar, and it would be good to have the basics right for that instead of building a pile of one-off tricks.

Some specific observations:

  • We operate on interfaces in nfd-worker, and IMO we'd better keep it that way to keep the design clean. E.g. introduce a new interface (AsyncSource, EventSource or something similar) with a method to set the event channel. Then, when configuring/enabling the feature sources, check whether the source implements the interface and, if it does, call the method.
  • It should be source/local that internally sets up the fswatcher and notifies nfd-worker. Then, we have two possibilities here:
    • either nfd-worker does the source.Discover() and then advertises the updated features/labels
    • or the source runs discovery internally and notifies nfd-worker just to re-advertise the updated features
  • When a source notifies the nfd-worker main loop, the main loop does not need to do full re-discovery of all feature sources
  • Some unit tests for the local source would be nice 😊
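
For illustration, the interface-based design described in the list above could look roughly like the following. This is a minimal, self-contained sketch and not the code from this PR: FeatureSource, localSource, cpuSource, configureSources and handleEvent are hypothetical stand-ins, and the SetNotifyChannel signature is an assumption based on the call quoted later in this review.

```go
package main

import "fmt"

// FeatureSource is a hypothetical stand-in for the existing source interface;
// only the methods needed for this sketch are included.
type FeatureSource interface {
	Name() string
	Discover() error
}

// EventSource is the opt-in interface: a source that can notify the worker
// on its own implements it in addition to FeatureSource. The signature is an
// assumption made for this sketch.
type EventSource interface {
	FeatureSource
	SetNotifyChannel(ch chan<- string) error
}

// localSource pretends to watch features.d; trigger simulates a file change.
type localSource struct {
	notify     chan<- string
	discovered int
}

func (s *localSource) Name() string    { return "local" }
func (s *localSource) Discover() error { s.discovered++; return nil }
func (s *localSource) SetNotifyChannel(ch chan<- string) error {
	s.notify = ch
	return nil
}
func (s *localSource) trigger() { s.notify <- s.Name() }

// cpuSource has no event support, so it is never handed the channel.
type cpuSource struct{ discovered int }

func (s *cpuSource) Name() string    { return "cpu" }
func (s *cpuSource) Discover() error { s.discovered++; return nil }

// configureSources hands the event channel to every source that implements
// EventSource and returns the names of the sources that were wired up.
func configureSources(sources map[string]FeatureSource, events chan<- string) ([]string, error) {
	var wired []string
	for name, s := range sources {
		if es, ok := s.(EventSource); ok {
			if err := es.SetNotifyChannel(events); err != nil {
				return nil, fmt.Errorf("source %s: %w", name, err)
			}
			wired = append(wired, name)
		}
	}
	return wired, nil
}

// handleEvent re-runs discovery only for the source that sent the event,
// instead of re-discovering every feature source.
func handleEvent(sources map[string]FeatureSource, name string) error {
	s, ok := sources[name]
	if !ok {
		return fmt.Errorf("unknown source %q", name)
	}
	return s.Discover()
}

func main() {
	local, cpu := &localSource{}, &cpuSource{}
	sources := map[string]FeatureSource{"local": local, "cpu": cpu}
	events := make(chan string, 1)

	wired, err := configureSources(sources, events)
	if err != nil {
		panic(err)
	}
	local.trigger() // simulated change in features.d
	if err := handleEvent(sources, <-events); err != nil {
		panic(err)
	}
	fmt.Println(wired, local.discovered, cpu.discovered)
}
```

The type assertion keeps nfd-worker working against interfaces only: sources without event support need no changes, and a future uevent-based source would only have to implement the extra method.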

if err != nil {
	if !os.IsNotExist(err) {
		return err
	}
Contributor

If it does not exist we probably want to exit, too (return nil)?

@marquiz
Contributor

marquiz commented Jul 29, 2025

ping @ozhuraki any update on this?

Signed-off-by: Oleg Zhurakivskyy <[email protected]>
@ArangoGutierrez
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 14, 2025
Copilot

This comment was marked as outdated.

@ArangoGutierrez
Contributor

/test pull-node-feature-discovery-verify-master

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 14, 2025
@ArangoGutierrez
Contributor

/retest

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements filesystem watching functionality for the nfd-worker to automatically detect changes in the features.d directory. It introduces an EventSource interface to enable sources to send notifications when their underlying data changes, specifically targeting the local source to watch for file modifications.

  • Adds EventSource interface and related infrastructure for event-driven feature discovery
  • Implements filesystem watching in the local source using fsnotify
  • Integrates event handling into the nfd-worker main loop to trigger selective feature discovery

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File | Description
source/source.go | Adds EventSource interface and GetAllEventSources() function
source/local/local.go | Implements EventSource interface with fsnotify-based file watching
pkg/nfd-worker/nfd-worker.go | Integrates event handling and selective feature discovery by source

ch <- s.Name()
}
case err := <-s.fsWatcher.Errors:
klog.ErrorS(err, "failed to to watch features.d changes")

Copilot AI Aug 14, 2025

There's a duplicate 'to' in the error message. It should be "failed to watch features.d changes".

Suggested change
klog.ErrorS(err, "failed to to watch features.d changes")
klog.ErrorS(err, "failed to watch features.d changes")

case err := <-s.fsWatcher.Errors:
klog.ErrorS(err, "failed to to watch features.d changes")
}
time.Sleep(1 * time.Second)

Copilot AI Aug 14, 2025

The hardcoded 1-second sleep in the event loop may cause unnecessary delays in event processing. Consider removing this sleep or making it configurable, as fsnotify events should be processed immediately.

Suggested change
time.Sleep(1 * time.Second)
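
For reference, a loop built on select blocks until an event, an error, or a stop signal arrives, so it neither spins nor needs a sleep. The sketch below is self-contained and uses a hypothetical watcher type with Events and Errors channels in place of the real fsnotify watcher; the stop channel and the ok checks for closed channels are assumptions added for the example, not code from this PR.

```go
package main

import "fmt"

// watcher is a stand-in for a file-system watcher that exposes an event
// channel and an error channel.
type watcher struct {
	Events chan string
	Errors chan error
}

// runNotifier sends one notification per event. It returns when stop is
// closed or when either watcher channel is closed.
func runNotifier(w *watcher, name string, notify chan<- string, stop <-chan struct{}) {
	for {
		select {
		case _, ok := <-w.Events:
			if !ok {
				return // watcher was closed
			}
			select {
			case notify <- name:
			case <-stop:
				return
			}
		case err, ok := <-w.Errors:
			if !ok {
				return // watcher was closed
			}
			fmt.Println("watch error:", err)
		case <-stop:
			return
		}
	}
}

func main() {
	w := &watcher{Events: make(chan string), Errors: make(chan error)}
	notify := make(chan string)
	stop := make(chan struct{})
	done := make(chan struct{})

	go func() {
		runNotifier(w, "local", notify, stop)
		close(done)
	}()

	w.Events <- "features.d/example" // simulated file change
	fmt.Println(<-notify)

	close(stop)
	<-done
}
```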

}

go s.runNotifier(ch)


Copilot AI Aug 14, 2025

The goroutine is started unconditionally, even when no filesystem watcher is created (when the directory doesn't exist). This will create a goroutine that only handles errors from a nil watcher, which could lead to unexpected behavior.

Suggested change
go s.runNotifier(ch)
}

@@ -121,6 +121,7 @@ type nfdWorker struct {
k8sClient k8sclient.Interface
nfdClient nfdclient.Interface
stop chan struct{} // channel for signaling stop
sourceEvent chan string // channel for events from soures

Copilot AI Aug 14, 2025

There's a spelling error in the comment. 'soures' should be 'sources'.

Suggested change
sourceEvent chan string // channel for events from soures
sourceEvent chan string // channel for events from sources

w.sourceEvent = make(chan string)
eventSources := source.GetAllEventSources()
for _, s := range eventSources {
s.SetNotifyChannel(w.sourceEvent)

Copilot AI Aug 14, 2025

The error returned by SetNotifyChannel() is not being handled. If setting up the notification channel fails for any source, it could lead to silent failures in event monitoring.

Suggested change
s.SetNotifyChannel(w.sourceEvent)
if err := s.SetNotifyChannel(w.sourceEvent); err != nil {
klog.ErrorS(err, "failed to set notify channel for event source", "source", s.Name())
return fmt.Errorf("failed to set notify channel for event source %s: %w", s.Name(), err)
}

@k8s-ci-robot
Contributor

@ozhuraki: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-node-feature-discovery-verify-master | b9ea21a | link | true | /test pull-node-feature-discovery-verify-master
pull-node-feature-discovery-e2e-test-master | b9ea21a | link | true | /test pull-node-feature-discovery-e2e-test-master
pull-node-feature-discovery-build-image-cross-generic | b9ea21a | link | true | /test pull-node-feature-discovery-build-image-cross-generic

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Watch features.d changes
4 participants